diBELLA: Distributed Long Read to Long Read Alignment
We present a parallel algorithm and scalable implementation for genome
analysis, specifically the problem of finding overlaps and alignments for data
from "third generation" long read sequencers. While long sequences of DNA offer
enormous advantages for biological analysis and insight, current long read
sequencing instruments have high error rates and therefore require different
approaches to analysis than their short read counterparts. Our work focuses on
an efficient distributed-memory parallelization of an accurate single-node
algorithm for overlapping and aligning long reads. We achieve scalability of
this irregular algorithm by addressing the competing issues of increasing
parallelism, minimizing communication, constraining the memory footprint, and
ensuring good load balance. The resulting application, diBELLA, is the first
distributed memory overlapper and aligner specifically designed for long reads
and parallel scalability. We describe and present analyses for high level
design trade-offs and conduct an extensive empirical analysis that compares
performance characteristics across state-of-the-art HPC systems as well as a
commercial cloud architecture, highlighting the advantages of state-of-the-art
network technologies.
Comment: This is the authors' preprint of the article that appears in the
proceedings of ICPP 2019, the 48th International Conference on Parallel
Processing.
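Long-read overlap pipelines of the kind diBELLA parallelizes are typically seeded by shared k-mers: reads that contain a common k-length substring become candidate overlap pairs, which are then verified by alignment. The following is a minimal single-node sketch of that seeding step; the function names and the simple set-based index are illustrative assumptions, not diBELLA's actual code.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield all length-k substrings of a read."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def candidate_overlaps(reads, k):
    """Map each k-mer to the set of reads containing it, then pair up
    reads that share at least one k-mer (candidate overlaps to verify
    later by alignment)."""
    index = defaultdict(set)
    for rid, seq in reads.items():
        for km in kmers(seq, k):
            index[km].add(rid)
    pairs = set()
    for rids in index.values():
        rids = sorted(rids)
        for i in range(len(rids)):
            for j in range(i + 1, len(rids)):
                pairs.add((rids[i], rids[j]))
    return pairs

reads = {"r1": "ACGTACGT", "r2": "TACGTTTT", "r3": "GGGGCCCC"}
print(candidate_overlaps(reads, 5))  # {('r1', 'r2')}: r1 and r2 share "TACGT"
```

In a distributed setting, the k-mer index itself is what must be partitioned across nodes, which is where the communication and load-balance issues the abstract describes arise.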
10 Years Later: Cloud Computing is Closing the Performance Gap
Can cloud computing infrastructures provide HPC-competitive performance for
scientific applications broadly? Despite prolific related literature, this
question remains open. Answers are crucial for designing future systems and
democratizing high-performance computing. We present a multi-level approach to
investigate the performance gap between HPC and cloud computing, isolating
different variables that contribute to this gap. Our experiments are divided
into (i) hardware and system microbenchmarks and (ii) user application proxies.
The results show that today's high-end cloud computing can deliver
HPC-competitive performance not only for computationally intensive applications
but also for memory- and communication-intensive applications - at least at
modest scales - thanks to the high-speed memory systems and interconnects and
dedicated batch scheduling now available on some cloud platforms.
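Hardware microbenchmarks of the kind described in (i) often include a streaming memory-bandwidth kernel. Below is a simplified STREAM-triad-style sketch in NumPy; the function name and parameters are illustrative, not the paper's actual benchmark suite.

```python
import time
import numpy as np

def triad_bandwidth(n=20_000_000, trials=5):
    """STREAM-triad-like kernel a = b + s*c. Reports the best observed
    memory bandwidth in GB/s: each iteration streams three arrays of
    8-byte floats (read b, read c, write a), i.e. 24*n bytes."""
    b = np.random.rand(n)
    c = np.random.rand(n)
    s = 3.0
    best = 0.0
    for _ in range(trials):
        t0 = time.perf_counter()
        a = b + s * c
        dt = time.perf_counter() - t0
        best = max(best, 3 * 8 * n / dt / 1e9)
    return best

print(f"best triad bandwidth: {triad_bandwidth():.1f} GB/s")
```

Running the same kernel on an HPC node and a cloud instance gives one of the isolated data points this kind of multi-level comparison is built from.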
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
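The two motifs the abstract argues are missing can be illustrated on the same task, counting k-mers, solved once with a hash table and once by sorting. This is a toy sketch, not code from the paper:

```python
from collections import Counter
from itertools import groupby

def count_kmers_hash(seq, k):
    """Hashing motif: a single pass of hash-table updates. In parallel,
    these become asynchronous updates to a shared data structure."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def count_kmers_sort(seq, k):
    """Sorting motif: sort the k-mers, then count equal runs. In
    distributed memory this becomes a sample sort plus a local scan,
    trading random access for structured communication."""
    kms = sorted(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {km: len(list(g)) for km, g in groupby(kms)}

seq = "ACGTACGTAC"
print(count_kmers_hash(seq, 4))  # ACGT: 2, CGTA: 2, GTAC: 2, TACG: 1
```

Both strategies compute the same answer; which one scales better depends on the machine's balance of network latency, bandwidth, and memory, which is exactly why these patterns merit motif status.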
LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment
Pairwise sequence alignment is one of the most computationally intensive
kernels in genomic data analysis, accounting for more than 90% of the runtime
for key bioinformatics applications. This method is particularly expensive for
third-generation sequences due to the high computational cost of analyzing
sequences of length between 1Kb and 1Mb. Given the quadratic overhead of exact
pairwise algorithms for long alignments, the community primarily relies on
approximate algorithms that search only for high-quality alignments and stop
early when one is not found. In this work, we present the first GPU
optimization of the popular X-drop alignment algorithm, which we named LOGAN.
Results show that our high-performance multi-GPU implementation achieves up to
181.6 GCUPS and speed-ups up to 6.6x and 30.7x using 1 and 6 NVIDIA Tesla V100 GPUs,
respectively, over the state-of-the-art software running on two IBM Power9
processors using 168 CPU threads, with equivalent accuracy. We also demonstrate
a 2.3x LOGAN speed-up versus ksw2, a state-of-the-art vectorized algorithm for
sequence alignment implemented in minimap2, a long-read mapping software. To
highlight the impact of our work on a real-world application, we couple LOGAN
with a many-to-many long-read alignment software called BELLA, and demonstrate
that our implementation improves the overall BELLA runtime by up to 10.6x.
Finally, we adapt the Roofline model for LOGAN and demonstrate that our
implementation is near-optimal on NVIDIA Tesla V100 GPUs.
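The X-drop idea itself is simple to sketch: keep extending an alignment, but abandon it as soon as the running score falls more than X below the best score seen so far. Below is a toy ungapped version in Python; LOGAN implements the gapped, antidiagonal-banded variant on GPUs, so this sketch is only illustrative of the termination criterion.

```python
def xdrop_extend(a, b, x, match=1, mismatch=-1):
    """Ungapped X-drop extension: walk both sequences in lockstep,
    scoring matches/mismatches, and stop once the running score falls
    more than x below the best score seen so far. Returns
    (best_score, extension_length). The gapped variant applies the same
    cutoff to each antidiagonal of a banded DP matrix."""
    best = score = 0
    best_len = 0
    for i in range(min(len(a), len(b))):
        score += match if a[i] == b[i] else mismatch
        if score > best:
            best, best_len = score, i + 1
        if best - score > x:  # the X-drop termination test
            break
    return best, best_len

print(xdrop_extend("ACGTACGT", "ACGTTTTT", x=2))  # (4, 4)
```

The early-exit test is what makes the method approximate but fast: low-quality extensions are cut off long before the quadratic cost of a full alignment is paid.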
New Generation of Educators Initiative: Transforming teacher preparation.
The focus of the New Generation of Educators Initiative (NGEI) was to answer the question "What would it take to transform teacher education?" From 2016 to 2019, with support from the S. D. Bechtel, Jr. Foundation, teacher education programs at 10 California State University (CSU) campuses partnered with local school districts to design and demonstrate innovative practices that could transform teacher preparation. This report documents the learnings from multiple participants in this transformative work, including Foundation program staff and representatives from partnerships between universities and school districts.
Parallelizing Irregular Applications for Distributed Memory Scalability: Case Studies from Genomics
Generalizable approaches, models, and frameworks for irregular application scalability are an old yet open area in parallel and distributed computing research. Irregular applications are particularly hard to parallelize and distribute because, by definition, the pattern of computation is dependent upon the input data. With the proliferation of data-driven and data-intensive applications from the realm of Big Data, and the increasing demand for and availability of large-scale computing resources through HPC-Cloud convergence, the importance of generalized approaches to achieving irregular application scalability is only growing. Rather than offering another software language or framework, this dissertation argues we first need to understand application scalability, especially irregular application scalability, and more closely examine patterns of computation, data sharing, and dependencies. As it stands, predominant performance models and tools from parallel and distributed computing focus on applications that are divided into distinct communication and computation phases, and ignore issues related to memory utilization. While time-tested and valuable, these models are not always sufficient for understanding full application scalability, particularly the scalability of data-intensive irregular applications. We present application case studies from genomics, highlighting the interdependencies of communication, computation, and memory capacities and performance. The genomics applications we examine offer a particularly useful and practical vantage point for this analysis, as they are data-intensive irregular application targets for both HPC and cloud computing. Further, they present an extreme for both domains. For HPC, they are less akin to traditional, well-studied and well-supported scientific simulations and more akin to text and document analysis applications.
For cloud computing, they are an extreme in that they require frequent random global access to memory and data, stressing interconnection network latency and bandwidth and co-scheduled processors for tightly orchestrated computation. We show how common patterns of irregular all-to-all computation can be managed efficiently, comparing bulk-synchronous approaches built on collective communication and asynchronous approaches based on one-sided communication. For the former, our work is based on the popular Message Passing Interface (MPI) and makes heavy use of globally collective communication operations that exchange data across processors in a single step or, to save memory use, in a set of irregular steps. For the latter, we build on the UPC++ programming framework, which provides lightweight RPC mechanisms, to transfer both data and computational work between processors. We present performance results across multiple platforms including several modern HPC systems and, at least in one case, a cloud computing platform. With these application case studies, we seek not only to contribute to discussions around parallel algorithm and data structure design, programming systems, and performance modeling within the parallel computing community, but also to contribute to broader work in genomics through software development and analysis. Thus, we develop and present the first distributed memory scalable software for analyzing data sets from the latest generation of sequencing technologies, known as long read data sets. Specifically, we present scalable solutions to the problem of many-to-many long read overlap and alignment, the computational bottleneck to long read assembly, error correction, and direct analysis. Through cross-architectural empirical analysis, we identify the key components to efficient scalability, and highlight the priorities for any future optimization with analytical models.
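The bulk-synchronous pattern described above, where each process buckets its outgoing items by destination rank and then exchanges everything in one collective step, can be sketched without MPI as a toy simulation. Real code would use MPI_Alltoallv and hash k-mers to ranks; here the keys are already small integers and the "collective" is a plain function.

```python
def bucket_by_rank(items, nranks):
    """Phase 1 of a bulk-synchronous exchange: each rank buckets its
    outgoing (key, value) pairs by destination rank. Real code would
    hash a k-mer to pick the rank; here keys are integers already."""
    buckets = [[] for _ in range(nranks)]
    for key, val in items:
        buckets[key % nranks].append((key, val))
    return buckets

def alltoallv(sendbufs):
    """Simulated MPI_Alltoallv: sendbufs[i][j] is what rank i sends to
    rank j; rank j receives the concatenation over all senders i."""
    n = len(sendbufs)
    return [[it for i in range(n) for it in sendbufs[i][j]] for j in range(n)]

# two ranks, each holding local items keyed by k-mer id
rank_items = [[(0, "a"), (1, "b"), (2, "c")], [(1, "d"), (3, "e")]]
send = [bucket_by_rank(items, 2) for items in rank_items]
recv = alltoallv(send)
print(recv)  # rank 0 receives the even keys, rank 1 the odd keys
```

The asynchronous alternative the dissertation compares against replaces the single collective step with one-sided RPCs that deliver each item to its owner as soon as it is produced, trading synchronization for per-message overhead.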
Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly
One of the most computationally intensive tasks in computational biology is
de novo genome assembly, the decoding of the sequence of an unknown genome from
redundant and erroneous short sequences. A common assembly paradigm identifies
overlapping sequences, simplifies their layout, and creates consensus. Despite
many algorithms developed in the literature, the efficient assembly of large
genomes is still an open problem. In this work, we introduce new
distributed-memory parallel algorithms for overlap detection and layout
simplification steps of de novo genome assembly, and implement them in the
diBELLA 2D pipeline. Our distributed memory algorithms for both overlap
detection and layout simplification are based on linear-algebra operations over
semirings using 2D distributed sparse matrices. Our layout step consists of
performing a transitive reduction from the overlap graph to a string graph. We
provide a detailed communication analysis of the main stages of our new
algorithms. diBELLA 2D achieves near linear scaling with over 80% parallel
efficiency for the human genome, reducing the runtime for overlap detection by
1.2-1.3x for the human genome and 1.5-1.9x for C. elegans compared to the
state-of-the-art. Our transitive reduction algorithm outperforms an existing
distributed-memory implementation by 10.5-13.3x for the human genome and 18-29x
for C. elegans. Our work paves the way for efficient de novo assembly of
large genomes using long reads in distributed memory.
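The transitive-reduction step can be sketched over the boolean semiring: an overlap edge u->w is transitive when some intermediate v gives u->v and v->w, and a boolean matrix product of the adjacency matrix with itself detects exactly those two-hop paths. Below is a dense NumPy toy version; diBELLA 2D's actual algorithm operates on 2D-distributed sparse matrices and accounts for overlap lengths, which this sketch ignores.

```python
import numpy as np

def transitive_reduction(adj):
    """One-hop transitive reduction over the boolean semiring: an edge
    u->w is removed when (A @ A) is nonzero at (u, w), i.e. when some v
    gives u->v->w. Toy dense version of the sparse-matrix formulation."""
    a = adj.astype(bool)
    two_hop = (a.astype(int) @ a.astype(int)) > 0  # boolean-semiring matmul
    return a & ~two_hop

# chain 0->1->2 plus the shortcut 0->2; the shortcut is transitive
a = np.zeros((3, 3), dtype=bool)
a[0, 1] = a[1, 2] = a[0, 2] = True
print(transitive_reduction(a).astype(int))
# [[0 1 0]
#  [0 0 1]
#  [0 0 0]]
```

Casting the reduction as a matrix product is what lets the pipeline reuse well-studied 2D distributed sparse-matrix multiplication, with communication costs that can be analyzed in closed form.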